fix: add deduplication for episodic/event_log write and foresight expiry cleanup #102
Open
r266-tech wants to merge 1 commit into EverMind-AI:main from
…iry cleanup

Closes EverMind-AI#95

## Changes

### 1. Delete-before-insert dedup in save_memory_docs()
- For episodic_memory: before inserting, delete existing records with the same parent_id from MongoDB, Elasticsearch, and Milvus
- For event_log: same delete-before-insert by parent_id across all stores
- Dedup is best-effort: failures are logged as warnings but do not block insert

### 2. Foresight expiry cleanup
- New cleanup_expired_foresights() function that removes ForesightRecords where end_time < today from all three stores (MongoDB, ES, Milvus)
- Can be called periodically (e.g., via cron/scheduler) to keep storage lean

### 3. New delete_by_parent_id on EpisodicMemoryRawRepository
- Added missing method to delete episodic memories by parent_id (EventLogRecordRawRepository already had this method)

### 4. Tests
- tests/test_write_pipeline_dedup.py covers dedup and cleanup with mocked repos
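For readers skimming the commit message, the two behaviours it describes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `InMemoryStore`, `save_with_dedup`, and `cleanup_expired` are hypothetical names, and the in-memory list stands in for MongoDB, Elasticsearch, and Milvus. It only assumes what the message states: delete by parent_id before insert, log a warning and continue if the delete fails, and remove records whose end_time is before today.

```python
import logging
from datetime import date

logger = logging.getLogger(__name__)


class InMemoryStore:
    """Hypothetical stand-in for one backing store (MongoDB, ES, or Milvus)."""

    def __init__(self, name, fail_on_delete=False):
        self.name = name
        self.fail_on_delete = fail_on_delete  # simulate a flaky store
        self.records = []

    def delete_by_parent_id(self, parent_id):
        if self.fail_on_delete:
            raise RuntimeError(f"{self.name}: delete failed")
        before = len(self.records)
        self.records = [r for r in self.records if r["parent_id"] != parent_id]
        return before - len(self.records)

    def delete_expired(self, today):
        # Keep records that are still valid today; drop end_time < today.
        before = len(self.records)
        self.records = [r for r in self.records if r["end_time"] >= today]
        return before - len(self.records)

    def insert(self, record):
        self.records.append(record)


def save_with_dedup(stores, record):
    """Delete-before-insert: drop old records sharing parent_id, then insert."""
    for store in stores:
        try:
            store.delete_by_parent_id(record["parent_id"])
        except Exception as exc:
            # Best-effort: a failed delete is logged and does not block the insert.
            logger.warning("dedup failed on %s: %s", store.name, exc)
        store.insert(record)


def cleanup_expired(stores, today):
    """Remove expired records from every store; return the total removed."""
    return sum(store.delete_expired(today) for store in stores)
```

Calling `save_with_dedup` twice with the same parent_id leaves a single record in each healthy store, while a store whose delete raises ends up with both copies, which is the trade-off of the best-effort approach. `cleanup_expired(stores, date.today())` is the kind of call a cron job or scheduler would make periodically.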
Summary
Fixes #95 — Memory write pipeline: add deduplication for episodic/event_log and expiry cleanup for foresight.
Problem
Duplicate records: When the same MemCell is processed more than once, both the episodic_memory and event_log collections accumulate duplicate entries with the same parent_id. This degrades retrieval ranking quality.
Stale foresight: ForesightRecord has a validity window (start_time / end_time), but expired records are never deleted; they accumulate as dead data across MongoDB, Elasticsearch, and Milvus.
Changes
1. Delete-before-insert dedup in save_memory_docs()
In src/biz_layer/mem_memorize.py, before inserting new docs:
- episodic_memory: delete existing records with the same parent_id from MongoDB, ES, and Milvus
- event_log: the same delete-before-insert by parent_id across all three stores
2. Foresight expiry cleanup
New cleanup_expired_foresights() function that:
- removes ForesightRecords where end_time < today from all three stores
3. New delete_by_parent_id on EpisodicMemoryRawRepository
Added the missing method (EventLogRecordRawRepository already had this).
4. Tests
tests/test_write_pipeline_dedup.py covers: